NXP Backend: Add imxrt700cm backend which combines the Neutron and CortexM backends#18488
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18488

❌ 2 New Failures, 1 Cancelled Job, 2 Unrelated Failures as of commit ea7648a with merge base 5e77594:
- NEW FAILURES: the following jobs have failed.
- CANCELLED JOB: the following job was cancelled; please retry.
- BROKEN TRUNK: the following jobs failed but were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from dda9ddd to b45f3f7.
> training, the weights will be stored in the file.
> :param train: Boolean indicating whether to train the model.
> :param num_epochs: Number of epochs to use during training.
> :param cortex_m_safe: There is a bug in the Cortex-M backend related to the `pad` operator. If this parameter is
Let's fix this if it is something quick, as opposed to introducing new bypass logic? WDYT? CC @rascani since you were discussing this earlier this week in the context of NHWC.
Good point.
The issue with the pad operator is that calling the pad replacement op in `executorch/backends/cortex_m/passes/quantized_op_fusion_pass.py` (lines 410 to 415 in 28b4813) would produce a contiguous output even when the input had the channels-last dim order. I tried to find the root cause but couldn't, so I opted for the bypass and planned to report the issue after raising this PR.
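For illustration, the dim-order mismatch described above can be avoided by restoring the input's memory format on the output. This is only a sketch using the public `torch.nn.functional.pad`, not the actual replacement op in the fusion pass; the helper name is hypothetical.

```python
import torch
import torch.nn.functional as F

def pad_preserving_dim_order(x, pad, value=0.0):
    # Hypothetical sketch: remember whether the input was channels-last,
    # and restore that memory format on the padded output so downstream
    # nodes see the dim order they expect.
    channels_last = x.is_contiguous(memory_format=torch.channels_last)
    out = F.pad(x, pad, value=value)
    if channels_last and not out.is_contiguous(memory_format=torch.channels_last):
        out = out.contiguous(memory_format=torch.channels_last)
    return out

x = torch.randn(1, 3, 8, 8).to(memory_format=torch.channels_last)
y = pad_preserving_dim_order(x, (1, 1, 1, 1))
```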
Thank you!
I have removed the workaround from our testing model.
> from torchao.quantization.pt2e.quantizer.quantizer import Q_ANNOTATION_KEY
>
> class IMXRT700CMQuantizer(Quantizer):
Quantizers are meant to be composable. A recipe is the right user-facing abstraction for targeting an SoC with multiple different backends. Take a look at https://github.com/pytorch/executorch/blob/main/export/tests/test_target_recipes.py, especially something like get_android_recipe, to understand how two or more quantizers/partitioners are encapsulated and made to work together.
In your case, I imagine a target recipe for rt700 with neutron and cortex-m.
Thank you @digantdesai for the insights. I have looked into it, and recipes definitely look like the right way forward.
I analyzed the state in executorch:

- To introduce an SoC recipe would require having recipes for both the Neutron and Cortex-M backends (both currently missing). Alternatively, the current Cortex-M and Neutron pipelines could be combined into a single recipe, but from a reuse perspective a base recipe for each backend seems better in my opinion. Our Neutron backend pipeline is currently implemented in
- The Neutron pipeline contains some kernel registration functionality, as only it knows which NPU kernels are required. This would probably require the creation of a new `Stage` type, or at least I didn't find any stage providing the functionality to just execute a function based on the presence of an option.
- QAT appears not to be supported. The `QuantizeStage` explicitly states it performs post-training quantization. I see that the `SourceTransformStage` also enables quantization in some way, but it doesn't seem QAT is supported. So perhaps this would require another new `Stage` type (or a modification of an existing stage).
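The missing "run a function when an option is present" stage mentioned above could look roughly like the sketch below. This is purely hypothetical: `CallbackStage` and the `options` dict are illustrative names, not ExecuTorch APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class CallbackStage:
    """Hypothetical stage that runs a callback only when a gating
    option is present, e.g. NPU kernel registration for Neutron."""
    option_key: str
    callback: Callable[[Any], Any]

    def run(self, artifact: Any, options: Dict[str, Any]) -> Any:
        # Execute the callback only if the gating option is set;
        # otherwise pass the artifact through unchanged.
        if options.get(self.option_key):
            return self.callback(artifact)
        return artifact

stage = CallbackStage("register_neutron_kernels", lambda artifact: artifact + 1)
```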
Given this, enabling the RT700 Neutron+Cortex-M backend via a recipe requires changes in multiple backends, and this PR would end up quite large. Can we do this in multiple stages? Such as:

- Experimentally, continue with this early implementation introducing the option to combine the Cortex-M and Neutron backends for the i.MX RT700.
- Rework the current Neutron lowering pipeline into a recipe, and do the same for the Cortex-M backend. Here we would potentially introduce new `Stage` types.
- Rework the `imxrt700cm` lowering into a recipe.
- Based on subsequent discussion, extend for QAT training.

For Cortex-M we need to sync with Arm too.
What is your opinion?
…ors. The operator `dim_order_ops._clone_dim_order.default` uses the `kwargs` to determine the output dim order. Since the `kwargs` were always empty, this operator produced an incorrect result in the pass, which broke the rest of the model.
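The kwargs-driven dim-order behavior described in that commit message can be illustrated with the public `Tensor.clone` `memory_format` kwarg, used here as a stand-in for the internal `_clone_dim_order` op (not the op itself):

```python
import torch

x = torch.randn(1, 3, 4, 4).to(memory_format=torch.channels_last)

# Default (preserve_format): the clone keeps the channels-last dim order.
same = x.clone()

# An explicit memory_format kwarg changes the output dim order; if the
# kwargs are dropped, the default behaviour silently applies instead,
# analogous to the bug fixed in the commit above.
cont = x.clone(memory_format=torch.contiguous_format)
```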
Force-pushed from b45f3f7 to ea7648a.
Summary
Add the imxrt700cm backend, which combines the Neutron and Cortex-M backends into one. The backend uses Neutron wherever possible, and the leftover nodes are handled by Cortex-M.
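The delegation order in the summary (Neutron wherever possible, leftovers to Cortex-M) can be shown with a toy partition assignment. This is not ExecuTorch code; the op sets and backend names are illustrative only.

```python
# Toy sets of supported ops, chosen for illustration only.
NEUTRON_SUPPORTED = {"conv2d", "linear", "relu"}
CORTEX_M_SUPPORTED = {"relu", "pad", "softmax"}

def assign_backends(ops):
    assignment = {}
    for op in ops:
        if op in NEUTRON_SUPPORTED:        # Neutron (NPU) wherever possible
            assignment[op] = "neutron"
        elif op in CORTEX_M_SUPPORTED:     # leftover nodes go to Cortex-M
            assignment[op] = "cortex_m"
        else:                              # otherwise fall back to portable kernels
            assignment[op] = "portable"
    return assignment

result = assign_backends(["conv2d", "pad", "relu"])
```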
Test plan
Unit tests provided
cc @robert-kalmar @JakeStevens @digantdesai